Materials on GitHub

@annakrystalli | a.krystalli[at]sheffield.ac.uk




Literate programming

Programming paradigm first introduced by Donald E. Knuth.

Treat a program as a piece of literature understandable to human beings

  • move away from writing programs in the manner and order imposed by the computer

  • focus instead on the logic and flow of human thought and understanding

  • single document to integrate data analysis (executable code) with textual documentation, linking data, code, and text



Why is this important in science:

Computational science has led to exciting new developments

  • Increasing data collection throughput; data are more complex and high-dimensional

  • Existing databases can be merged to become bigger databases

  • Computing power allows more sophisticated analyses, even on “small” data

  • For every field “X” there is a “Computational X”


Increased computational complexity has exposed limitations in our ability to evaluate published findings

  • Even basic analyses difficult to describe

  • Errors more easily introduced into long analysis pipelines

  • Knowledge transfer is inhibited

  • Results are difficult to replicate or reproduce

  • Complicated analyses cannot be trusted


Calls for reproducibility


Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

  • Fully scripted analysis workflows
  • Publication of code and data



Calls for open science

… highlight problems with users jumping straight into software implementations of methods (e.g. in R) that may lack documentation on biases and assumptions that are mentioned in the original papers.

To help solve these problems, we make a number of suggestions including providing blog posts or videos to explain new methods in less technical terms, encouraging reproducibility and code sharing, making wiki-style pages summarising the literature on popular methods, more careful consideration and testing of whether a method is appropriate for a given question/data set, increased collaboration, and a shift from publishing purely novel methods to publishing improvements to existing methods and ways of detecting biases or testing model fit. Many of these points are applicable across methods in ecology and evolution, not just phylogenetic comparative methods.



Science and the web

The web was made for open science!

Literate programming in R

rmarkdown (.Rmd) integrates:

– a documentation language (.md)

– a programming language (R)

Combine tools, processes and outputs into interactive evidence streams that are easily shareable, particularly through the web.



Rmarkdown overview

Features

RStudio features fly-through

What is R Markdown? from RStudio, Inc. on Vimeo.



The researcher’s perspective

a reproducible workflow in action



elements of R markdown


markdown .md

stripped down html. User can focus on communicating & disseminating


  • intended to be as easy-to-read and easy-to-write as possible.
  • intended for one purpose: to be used as a format for writing for the web.
  • clean and legible across platforms and outputs
  • syntax is very small, corresponding only to a very small subset of HTML tags.
  • formatting handled automatically
  • html markup also handled.

code {r, python, SQL, … }

  • Code chunks defined through special notation. Executed in sequence. Execution of individual chunks controllable

  • Analysis self-contained and reproducible
    • Run in a fresh R session every time document is knit.
  • A number of Language Engines are supported by knitr
    • R (default)
    • Python
    • SQL
    • Bash
    • Rcpp
    • Stan
    • JavaScript
    • CSS
  • Can read appropriately annotated .R scripts in and call them within an .Rmd


outputs

Knit together through package knitr to

Many great packages and applications build on rmarkdown.

All this makes it incredibly versatile. Check out the gallery


Simple interface to powerful modern web technologies and libraries

Applications in research

Rmd documents

Can be useful for a number of research related materials

  • Vignettes: long form documentation.
    • Analyses
    • Documentation (code & data)
    • Supplementary materials
  • Reports
  • Papers

Useful features:

  • bibliographies and citations

bookdown

Authoring with R Markdown. Offers:

  • cross-references,
  • citations,
  • HTML widgets and Shiny apps,
  • tables of content and section numbering

The publication can be exported to HTML, PDF, and e-books (e.g. EPUB). It can even be used to write a thesis!


pkgdown

For building package documentation

  • Can use it to document any functional code you produce and demonstrate its use through vignettes


workflowr pkg

Build analysis websites and organise your project

The workflowr R package makes it easier for researchers to organize their projects and share their results with colleagues.


blogdown

For creating and maintaining blogs.

Check out https://awesome-blogdown.com/, a curated list of awesome #rstats blogs in blogdown for inspiration!


Let’s have a look

open your first .Rmd!!

File > New File > RMarkdown… > Document


save and render it

Render an .Rmd document by clicking on the knit button.

You can also render .Rmd documents to html using rmarkdown function render()

rmarkdown::render(input = "render-this-doc.Rmd")
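
As a rough sketch of what else render() can do, the example below writes a throwaway .Rmd to a temporary directory and renders it with an explicit output format and file name. The file name render-demo.Rmd and its contents are made up for illustration; output_format, output_file and quiet are arguments of rmarkdown::render(). Rendering needs pandoc, which RStudio bundles.

```r
library(rmarkdown)

# Build a tiny, hypothetical .Rmd in a temporary directory
fence <- strrep("`", 3)
rmd_file <- file.path(tempdir(), "render-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Render demo\"",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Render to HTML, naming the output file explicitly
out_file <- render(
  input = rmd_file,
  output_format = "html_document",
  output_file = "render-demo.html",
  quiet = TRUE
)

out_file  # path to the rendered HTML file
```

render() returns the path of the rendered file invisibly, which is handy when rendering from a script.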

open the cheatsheet

install the packages we’ll need

install.packages(c("rmarkdown", "tidyverse", "plotly", "DT", "reprex"))



YAML header

define outputs


basic html_document

output: html_document


define a floating table of contents

output: 
  html_document:
    toc: true
    toc_float: true


choose a theme

Specify bootswatch themes.

output: 
  html_document:
    toc: true
    toc_float: true
    theme: cosmo


Can also override with custom .css

output: 
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    css: assets/css/my-theme.css

choose code highlights

output: 
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    highlight: zenburn
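
The same output settings can also be built in R rather than in the YAML header, which may be useful when rendering from a script. The sketch below uses rmarkdown::html_document() with the options shown above; the syntax highlighting style is passed through the highlight argument. The file name in the commented-out render() call is a placeholder.

```r
library(rmarkdown)

# An output format object equivalent to the YAML options above
fmt <- html_document(
  toc = TRUE,
  toc_float = TRUE,
  theme = "cosmo",
  highlight = "zenburn"
)

class(fmt)

# Pass it to render() instead of relying on the YAML header
# ("my-doc.Rmd" is a hypothetical file name)
# render("my-doc.Rmd", output_format = fmt)
```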



Markdown basics



text

    normal text

normal text

    *italic text*

italic text

    **bold text**

bold text

    ***bold italic text***

bold italic text

headers

rmarkdown

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

rendered html


unordered lists

rmarkdown

- first item in the list
- second item in list
- third item in list

rendered html

  • first item in the list
  • second item in list
  • third item in list

ordered lists

rmarkdown

1. first item in the list
1. second item in list
1. third item in list

rendered html

  1. first item in the list
  2. second item in list
  3. third item in list

quotes

rmarkdown

> this text will be quoted

rendered html

this text will be quoted


code

annotate code inline

rmarkdown

`this text will appear as code` inline

rendered html

this text will appear as code inline



evaluate r code inline

a <- 10

rmarkdown

the value of parameter *a* is `r a`

rendered html

the value of parameter a is 10
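
To try inline evaluation without creating a file, one option is to pass the text straight to knitr::knit() through its text argument. This is only a small sketch of the idea; in practice the inline expression lives in the body of your .Rmd.

```r
library(knitr)

a <- 10

# knit() accepts the document as text; the inline R expression is
# evaluated and replaced by its value in the returned string
out <- knit(text = "the value of parameter *a* is `r a`", quiet = TRUE)
out
```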



images

rmarkdown

![](assets/cheat.png)

rendered html


resize images

html in rmarkdown

<img src="assets/cheat.png" width="200px" />

rendered html


basic tables in markdown

rmarkdown


    Table Header  | Second Header
    ------------- | -------------
    Cell 1        | Cell 2
    Cell 3        | Cell 4 

rendered html

Table Header Second Header
Cell 1 Cell 2
Cell 3 Cell 4

Check out handy online .md table converter



html in rmarkdown

marking up with html tags

This text marked up in html

<strong>Bold text</strong>

renders to this

Bold text


This text marked up with Bootstrap alert CSS classes

<div class="alert alert-warning"><small>this is a warning message</small></div>

renders to

this is a warning message


<div class="alert alert-success"><small>this is a success message</small></div>

renders to

this is a success message

embedding tweets

This snippet was copied from Twitter in the embed format

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">How cool does this tweet look embedded in <a href="https://twitter.com/hashtag/rmarkdown?src=hash&amp;ref_src=twsrc%5Etfw">#rmarkdown</a>! 😎</p>&mdash; annakrystalli (@annakrystalli) <a href="https://twitter.com/annakrystalli/status/977209749958791168?ref_src=twsrc%5Etfw">March 23, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

renders to this

Embed gifs, videos, widgets in this way


Mathematical Expressions

Supports mathematical notations through MathJax.

You can write LaTeX math expressions inside a pair of dollar signs, e.g. $\alpha+\beta$ renders \(\alpha+\beta\). You can use the display style with double dollar signs:

$$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$

\[\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i\]


Chunks


R code chunks execute code.

They can also be used as a means to render R output into documents or to simply display code for illustration (e.g. with option eval=FALSE)


chunk notation

chunk notation in .rmd

```{r chunk-name}
print('hello world!')
```

rendered html code and output

print("hello world!")
## [1] "hello world!"

Chunks can be labelled with chunk names. Names must be unique.


chunk options

for more details see http://yihui.name/knitr/


uses

  • controlling whether code is displayed in the output (echo setting)
  • controlling whether code is evaluated (eval setting)
  • controlling how figures are displayed (fig.width and fig.height settings)
  • suppressing warnings and messages (warning and message settings)
  • caching computations (cache setting)
  • controlling whether code is extracted when using purl (purl setting)

controlling code display with echo

chunk notation in .rmd

```{r hide-code, echo=FALSE}
print('hello world!')
```

rendered html code and output

## [1] "hello world!"

controlling code evaluation with eval

chunk notation in .rmd

```{r dont-eval, eval=FALSE}
print('hello world!')
```

rendered html code and output

print("hello world!")

setting document level default options

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
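
A slightly fuller sketch of document-level defaults, typically placed in a setup chunk at the top of the .Rmd. The figure dimensions are arbitrary values chosen for illustration; saving and restoring the old options is only needed when experimenting in the console.

```r
library(knitr)

# Remember the current defaults so they can be restored afterwards
old_opts <- opts_chunk$get()

opts_chunk$set(
  echo = TRUE,       # show the code in the output
  warning = FALSE,   # hide warnings
  message = FALSE,   # hide messages
  fig.width = 6,     # default figure width in inches
  fig.height = 4     # default figure height in inches
)

# Query a single option to confirm the new default
fig_w <- opts_chunk$get("fig.width")
fig_w

# Restore the previous defaults
opts_chunk$restore(old_opts)
```

Options set on an individual chunk still override these defaults.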

reading chunks of code (R -> Rmd)

You can read in chunks of code from an annotated .R (or any other language) script using knitr::read_chunk()

Chunks are defined by the following notation

# ---- descriptive-chunk-name1 ----
code("you want to run as a chunk")

# ---- descriptive-chunk-name2 ----
code("you want to run as a chunk")

code in .R script hello-world.R

hello-world.R

# ---- demo-read_chunk ----
print("hello world")


read chunks from hello-world.R

knitr::read_chunk("hello-world.R")



call chunk by name

rmarkdown r chunk notation

```{r demo-read_chunk}

```

rendered html code and output

print("hello world")
## [1] "hello world"



Check chunks in the current session

knitr:::knit_code$get()
## $`demo-read_chunk`
## [1] "print(\"hello world\")"
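
The whole round trip can be tried in one go. This sketch writes a small annotated script to a temporary file, registers its chunks and lists their names. The chunk labels say-hello and say-goodbye are made up for illustration, and knit_code is the same internal knitr object used above, so it may change between knitr versions.

```r
library(knitr)

# Write a small annotated script to a temporary file (hypothetical content)
script <- file.path(tempdir(), "hello-world.R")
writeLines(c(
  "# ---- say-hello ----",
  "print(\"hello world\")",
  "",
  "# ---- say-goodbye ----",
  "print(\"goodbye world\")"
), script)

# Register the labelled chunks with knitr
read_chunk(script)

# Inspect what has been registered in this session
chunks <- knitr:::knit_code$get()
names(chunks)
```

Each registered label can then be used as the name of an empty chunk in the .Rmd.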

Extracting code from an .Rmd (Rmd -> R)

You can use knitr::purl() to tangle code out of an .Rmd into an .R script. purl() takes many of the same arguments as knit(). The most important additional argument is:

  • documentation: an integer specifying the level of documentation to add to the tangled script:
    • 0 means pure code (discard all text chunks)
    • 1 (default) means add the chunk headers to code
    • 2 means add all text chunks to code as roxygen comments

knitr::purl("file-to-extract-code-from.Rmd", documentation = 0)

extract using purl

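Here I’m running a loop to extract the code in demo-rmd.Rmd for each documentation level

file <- "demo-rmd.Rmd"
# Tangle the document once per documentation level (0, 1 and 2)
for (docu in 0:2) {
    # Output names: demo-rmd_0.R, demo-rmd_1.R, demo-rmd_2.R
    out_file <- paste0(tools::file_path_sans_ext(file), "_", docu, ".R")
    knitr::purl(file, output = out_file, documentation = docu, quiet = TRUE)
}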

demo-rmd_0.R

knitr::opts_chunk$set(echo = TRUE)
summary(cars)
plot(pressure)

demo-rmd_1.R

## ----setup, include=FALSE------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

## ----cars----------------------------------------------------------------
summary(cars)

## ----pressure, echo=FALSE------------------------------------------------
plot(pressure)

demo-rmd_2.R

#' ---
#' title: "Untitled"
#' author: "Anna Krystalli"
#' date: "3/23/2018"
#' output:
#'   html_document:
#'     toc: true
#'     toc_float: true
#'     theme: cosmo
#'     highlight: textmate
#' 
#' ---
#' 
## ----setup, include=FALSE------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

#' 
#' ## R Markdown
#' 
#' 
#' This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
#' 
#' When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
#' 
## ----cars----------------------------------------------------------------
summary(cars)

#' 
#' ## Including Plots
#' 
#' You can also embed plots, for example:
#' 
## ----pressure, echo=FALSE------------------------------------------------
plot(pressure)

#' 
#' Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Displaying data

printing data.frames

data(airquality)
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

printing tibbles

library(tibble)
as_tibble(airquality)
## # A tibble: 153 x 6
##    Ozone Solar.R  Wind  Temp Month   Day
##    <int>   <int> <dbl> <int> <int> <int>
##  1    41     190  7.40    67     5     1
##  2    36     118  8.00    72     5     2
##  3    12     149 12.6     74     5     3
##  4    18     313 11.5     62     5     4
##  5    NA      NA 14.3     56     5     5
##  6    28      NA 14.9     66     5     6
##  7    23     299  8.60    65     5     7
##  8    19      99 13.8     59     5     8
##  9     8      19 20.1     61     5     9
## 10    NA     194  8.60    69     5    10
## # ... with 143 more rows

knitr::kable() tables

library(knitr)
data(airquality)
kable(head(airquality), caption = "New York Air Quality Measurements")
New York Air Quality Measurements
Ozone Solar.R Wind Temp Month Day
41 190 7.4 67 5 1
36 118 8.0 72 5 2
12 149 12.6 74 5 3
18 313 11.5 62 5 4
NA NA 14.3 56 5 5
28 NA 14.9 66 5 6
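
kable() has a few more arguments worth knowing. As a sketch, the call below relabels the columns with their units (as given in ?airquality) and centres them; col.names and align are kable() arguments, while the labels themselves are just one possible choice.

```r
library(knitr)

aq <- head(airquality)

tab <- kable(
  aq,
  col.names = c("Ozone (ppb)", "Solar R (lang)", "Wind (mph)",
                "Temp (degrees F)", "Month", "Day"),
  align = "cccccc",  # one letter per column: l, c or r
  caption = "New York Air Quality Measurements"
)
tab
```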



plots

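library(ggplot2)  # provides ggplot() and the diamonds dataset

set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]  # random sample of 1000 diamonds

# The text aesthetic is not drawn by ggplot2; plotly can use it for tooltips
p <- ggplot(data = d, aes(x = carat, y = price)) +
    geom_point(aes(text = paste("Clarity:", clarity)), size = 1) +
    geom_smooth(aes(colour = cut, fill = cut)) + facet_wrap(~cut)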

p



interactivity


DT::datatable() tables

library(DT)
data(airquality)
datatable(airquality, caption = "New York Air Quality Measurements")
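
datatable() can be tuned through a handful of arguments. A minimal sketch, with the page length of 5 chosen arbitrarily: rownames, filter and options are datatable() arguments, and pageLength is passed on to the underlying DataTables library.

```r
library(DT)

widget <- datatable(
  airquality,
  caption = "New York Air Quality Measurements",
  rownames = FALSE,               # drop the row number column
  filter = "top",                 # add a filter box above each column
  options = list(pageLength = 5)  # show 5 rows per page
)
widget
```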



plotly plots

Wraps nicely around plotting library ggplot2

library(plotly)

ggplotly(p)
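
By default the tooltip shows every mapped aesthetic. As a sketch, ggplotly()'s tooltip argument can restrict it to the text aesthetic only. The smaller plot below is rebuilt from scratch so the example runs on its own; the sample size of 200 is arbitrary.

```r
library(ggplot2)
library(plotly)

set.seed(100)
d_small <- diamonds[sample(nrow(diamonds), 200), ]

# ggplot2 warns that it ignores the unknown `text` aesthetic;
# plotly picks it up when building the tooltip
p_small <- ggplot(d_small, aes(x = carat, y = price)) +
  geom_point(aes(text = paste("Clarity:", clarity)), size = 1)

# Show only the custom text when hovering over a point
widget <- ggplotly(p_small, tooltip = "text")
widget
```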



shiny

Shiny allows you to build interactive apps and dashboards through R and publish them through a free shinyapps account.
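
To give a feel for what a Shiny app looks like, here is a minimal sketch using the airquality data from earlier. The input and output ids (n, scatter) and the layout are made up for illustration; the app only launches in an interactive session.

```r
library(shiny)

# User interface: a slider and a plot
ui <- fluidPage(
  sliderInput("n", "Number of rows to plot", min = 10, max = 153, value = 50),
  plotOutput("scatter")
)

# Server logic: redraw the plot whenever the slider changes
server <- function(input, output) {
  output$scatter <- renderPlot({
    aq <- head(airquality, input$n)
    plot(aq$Wind, aq$Temp, xlab = "Wind", ylab = "Temp")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch the app only when running interactively
if (interactive()) runApp(app)
```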

Exercise


your mission

create your first .Rmd!

  • choose some data eg:
    • datasets package
    • data(package = .packages(all.available = TRUE))
    • A dataset of your own
  • show us some data in a table
  • plot some data
  • write a bit about what you did
  • publish it on RPubs. Add your link to our Google Doc

See my example: beavers!


Parting words


Getting help with markdown

To get help, you need a reproducible example

  • github issues
  • stackoverflow
  • slack channels
  • discussion boards

reprex

Use function reprex::reprex() to produce a reproducible example in a custom markdown format for the venue of your choice

  • "gh" for GitHub (default)
  • "so" for StackOverflow,
  • "r" or "R" for a runnable R script, with commented output interleaved.

using reprex

  1. Copy the code you want to run.
    • all required variables must be defined and libraries loaded
  2. In the console, call the reprex function

    reprex::reprex()
    • the code is executed in a fresh environment and “code + commented output” is returned invisibly and copied to the clipboard.
  3. Paste the result in the venue of your choice.
    • Once published it will be rendered to html.
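
Instead of copying code to the clipboard first, the code can also be passed to reprex() directly as an expression. A small sketch, assuming the reprex package and pandoc are installed; the toy calculation is only there to have something to run, and venue = "so" is one of the venues listed above.

```r
library(reprex)

# Pass the code as an expression and target StackOverflow formatting
out <- reprex({
  x <- c(1, 2, NA, 4)
  mean(x, na.rm = TRUE)
}, venue = "so")

# The rendered reprex as a character vector, ready to paste
out
```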

Learn about Version Control

Use Git and GitHub to manage, publish and collaborate on your work

See Happy Git with R Tutorial


Share your work

  • Start a blog!
  • Work openly

Keep learning with others